
Use orjson to de-/serialize json-rpc messages #2513

Merged
merged 4 commits into sublimelsp:main from feat/orjson-transport on Sep 30, 2024

Conversation

deathaxe
Contributor

After a recent discussion about JSON serialization performance, I was curious about the impact orjson has on LSP processing performance.

This PR showcases orjson as the transport de-/serializer.

During the past two weeks of daily use I haven't noticed any issues, but my projects are also small enough that I don't see any significant impact on overall performance.

Maybe someone using heavy language servers such as typescript can try to benchmark it.

@predragnikolic
Member

It can be tested with this package that Rafal wrote:
https://github.com/sublimelsp/LSP-performance-test

@deathaxe
Contributor Author

According to this plugin, orjson doesn't make any difference. Test results are 0.189 – 0.192 with both json and orjson on my machine.

@predragnikolic
Member

This thread has some information about where the bottlenecks are located and how they were solved:
#2190

@deathaxe
Contributor Author

deathaxe commented Sep 14, 2024

Yeah, and JSON serialization performance is one of the aspects covered there. Given orjson's promised performance, I would have expected at least a small improvement when deserializing messages from language servers. Maybe the payload is just too small for the gains to outweigh the overhead of the Python -> C/Rust -> Python translation.

@rchl
Member

rchl commented Sep 14, 2024

That test measures the performance of our format_completion, not JSON deserialization. It would have to be modified.

@rchl
Member

rchl commented Sep 29, 2024

I've added relevant tests in https://github.com/sublimelsp/LSP-performance-test

It does appear to be twice as fast:

orjson
Best of 3: 0.09664862500000027 usec. All times: [0.0971984999999993, 0.09785637499999922, 0.09664862500000027]

json:
Best of 3: 0.197854125000001 usec. All times: [0.2019498329999987, 0.19887404199999992, 0.197854125000001]

@deathaxe are you able to get this PR into a state that passes CI?

@rchl
Member

rchl commented Sep 29, 2024

For the record, that was parsing of a 50MB payload.

Here are also results for parsing a tiny payload 10000 times:

json
Best of 3: 0.02308479100000227 usec. All times: [0.023518416999991132, 0.02308479100000227, 0.023143875000002367]

orjson
Best of 3: 0.0058567079999996 usec. All times: [0.006264167000001208, 0.006002708000000467, 0.0058567079999996]

Seems like orjson is still faster.
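For anyone who wants to reproduce numbers of this kind locally, below is a rough, self-contained sketch of such a comparison. It is not the actual LSP-performance-test code; the payload shape, size and repeat counts are made up for illustration, and orjson has to be installed separately.

import json
import timeit

import orjson

# Build a large, completion-like payload and serialize it once up front.
payload = {"items": [{"label": f"symbol_{i}", "kind": 6, "detail": "x" * 32} for i in range(100_000)]}
payload_bytes = json.dumps(payload).encode("utf-8")

def parse_with_json() -> None:
    json.loads(payload_bytes.decode("utf-8"))

def parse_with_orjson() -> None:
    orjson.loads(payload_bytes)

for name, fn in (("json", parse_with_json), ("orjson", parse_with_orjson)):
    # timeit.repeat() returns raw seconds per run of `number` iterations.
    times = timeit.repeat(fn, number=10, repeat=3)
    print(f"{name}: Best of 3: {min(times)} sec. All times: {times}")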

rchl and others added 2 commits September 29, 2024 13:56
Decide at runtime whether to use orjson or json, because

1. it is something the pyright linter can handle
2. it works without touching unit tests
@deathaxe
Contributor Author

deathaxe commented Sep 29, 2024

No idea how to satisfy pyright's limitations.

I would prefer an import-time decision between orjson and json, but pyright doesn't get that right.

It doesn't even resolve the orjson import.
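For reference, here is a minimal sketch of one runtime-fallback pattern (the decode_payload helper name is made up, and this is not necessarily the code that was merged). It assumes orjson is resolvable in the type-checking environment; otherwise pyright still needs the dependency declared, as suggested below.

import json
from typing import Any

try:
    import orjson
except ImportError:
    orjson = None  # type: ignore  # fall back to the stdlib json module


def decode_payload(message: bytes) -> Any:
    # Prefer orjson when it is available; otherwise use the stdlib decoder.
    if orjson is not None:
        return orjson.loads(message)
    return json.loads(message.decode("utf-8"))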

@rchl
Member

rchl commented Sep 29, 2024

Add orjson==3.10.7 in:

plugin/core/transports.py (outdated review thread)
@rchl
Member

rchl commented Sep 29, 2024

I would prefer an import-time decision between orjson and json, but pyright doesn't get that right.

Probably good enough as is. Getting the types right otherwise does indeed seem tricky.

@jwortmann
Member

jwortmann commented Sep 29, 2024

orjson
Best of 3: 0.09664862500000027 usec. All times: [0.0971984999999993, 0.09785637499999922, 0.09664862500000027]

json:
Best of 3: 0.197854125000001 usec. All times: [0.2019498329999987, 0.19887404199999992, 0.197854125000001]

For the record, that was parsing of a 50MB payload.

Just for comparison, the time of a single frame on a 120 Hz display is 8333.3 µs, so the performance improvement when parsing a 50 MB payload would amount to around 0.0012 % of a single frame time. Not sure that is worth introducing another dependency...

Also, such big payloads only seem to happen for completions, e.g. with LSP-typescript, where a 36 MB server response already takes 2.9 seconds (see #2190). Then there is the time needed to create the CompletionItem objects and to pass them over the ST API, so we're probably at 3+ seconds. If the values above are correct, we would now save 0.0000001 seconds of JSON parsing (for 50 MB). It doesn't look to me like this would be the bottleneck...

Edit: #2190 (comment) says current JSON parsing takes 0.293 seconds, so there is probably an error in your reported times? Should it be seconds instead of usec?


The Python code could be simplified to a single line:

return orjson.loads(message) if orjson else json.loads(message.decode('utf-8'))

@deathaxe
Contributor Author

The Python code could be simplified to a single line:

That's just another flavor, not a real simplification. Such one-liners are often harder to read, and they don't execute any faster.

@rchl
Member

rchl commented Sep 29, 2024

Edit: #2190 (comment) says current JSON parsing takes 0.293 seconds, so there is probably an error in your reported times? Should it be seconds instead of usec?

The output of the test is actually in seconds.
I guess I got confused because, according to the documentation, the CLI variant of timeit outputs in usec.
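For context on the unit mix-up: timeit.timeit() and timeit.repeat() return plain seconds, whereas the timeit command-line interface prints a scaled per-loop time (e.g. in usec). A small illustration:

import timeit

# Total wall-clock time for 100000 calls, as returned by the module API.
elapsed = timeit.timeit("json.loads('[1, 2, 3]')", setup="import json", number=100_000)
print(elapsed)            # seconds
print(elapsed / 100_000)  # per-call time, still in seconds (not usec)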

@rchl rchl merged commit 373e3bb into sublimelsp:main Sep 30, 2024
8 checks passed
@deathaxe deathaxe deleted the feat/orjson-transport branch September 30, 2024 18:53
rchl added a commit that referenced this pull request Oct 2, 2024
* origin/main:
  Use orjson to de-/serialize json-rpc messages (#2513)
rchl added a commit that referenced this pull request Oct 2, 2024
* fix/remove-generators:
  Use orjson to de-/serialize json-rpc messages (#2513)
  remove unnecessary use of generators for session retrieval